Accelerating weighted random sampling without replacement

Author

  • Kirill Müller
Abstract

Random sampling from discrete populations is one of the basic primitives in statistical computing. This article briefly introduces weighted and unweighted sampling, with and without replacement. Weighted sampling without replacement appears to be the most difficult case to implement efficiently, which might be one reason why the R implementation performs slowly for large problem sizes. The article presents four alternative implementations for weighted sampling without replacement, with an analysis of their run time and correctness.
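As a point of reference, the sequential definition of weighted sampling without replacement can be sketched in Python as follows. This is an illustrative baseline only, not one of the four implementations analysed in the article (the abstract does not name them). The function name `weighted_sample_without_replacement` and its parameters are assumptions made for this example. Every draw rescans the remaining items, so the work grows roughly with the product of population size and sample size, which hints at why a naive approach becomes slow for large problems.

```python
import random


def weighted_sample_without_replacement(weights, k, rng=None):
    """Return k distinct indices into `weights`.

    At each step one of the remaining indices is drawn with probability
    proportional to its weight; the drawn index is then removed.
    Hypothetical helper for illustration; weights must be positive.
    """
    if rng is None:
        rng = random.Random()
    if k < 0 or k > len(weights):
        raise ValueError("k must be between 0 and len(weights)")
    if any(w <= 0 for w in weights):
        raise ValueError("all weights must be positive")

    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        # Total weight of the items that are still available.
        total = sum(weights[i] for i in remaining)
        threshold = rng.random() * total
        acc = 0.0
        pos = 0
        # Linear scan: find the first item whose cumulative weight
        # exceeds the threshold (falls back to the last item on
        # floating-point round-off).
        for pos, i in enumerate(remaining):
            acc += weights[i]
            if threshold < acc:
                break
        chosen.append(remaining.pop(pos))
    return chosen


if __name__ == "__main__":
    print(weighted_sample_without_replacement([1.0, 2.0, 3.0, 4.0], 2))
```

The order of the returned indices is the order in which they were drawn, so heavier items tend to appear earlier.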

Similar resources

Weighted Random Sampling (2005; Efraimidis, Spirakis)

The problem of random sampling without replacement (RS) calls for the selection of m distinct random items out of a population of size n. If all items have the same probability to be selected, the problem is known as uniform RS. Uniform random sampling in one pass is discussed in [1, 5, 10]. Reservoir-type uniform sampling algorithms over data streams are discussed in [11]. A parallel uniform r...
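To make the one-pass uniform case concrete, here is a minimal Python sketch of classic reservoir sampling (often called Algorithm R). It is not taken from the cited entry; the function name `reservoir_sample` and its parameters are assumptions for illustration. It selects m distinct items from a stream whose length need not be known in advance.

```python
import random


def reservoir_sample(stream, m, rng=None):
    """Uniformly sample up to m items from an iterable in one pass.

    Hypothetical helper for illustration. If the stream has fewer than
    m items, all of them are returned.
    """
    if rng is None:
        rng = random.Random()
    reservoir = []
    for t, item in enumerate(stream):
        if t < m:
            # Fill the reservoir with the first m items.
            reservoir.append(item)
        else:
            # Item number t+1 replaces a random slot with
            # probability m / (t + 1).
            j = rng.randint(0, t)  # both ends inclusive
            if j < m:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1000), 5))
```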

Weighted Random Sampling over Data Streams

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2,4]), discuss sampling with and without replacement and show adaptations of the algorithms for several WRS problems and evolving data streams.

Weighted Sampling Without Replacement from Data Streams

Weighted sampling without replacement has proved to be a very important tool in designing new algorithms. Efraimidis and Spirakis (IPL 2006) presented an algorithm for weighted sampling without replacement from data streams. Their algorithm works under the assumption of precise computations over the interval [0, 1]. Cohen and Kaplan (VLDB 2008) used similar methods for their bottom-k sketches. ...
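The key-based scheme of Efraimidis and Spirakis can be sketched in Python as below: each item with weight w receives the key u^(1/w), with u drawn uniformly from [0, 1), and the m items with the largest keys form the sample. This is a minimal sketch under assumptions, not the code of any cited paper. The name `a_res_sample` and the (item, weight) stream format are chosen for this example, and ordinary floating-point arithmetic stands in for the precise computations over [0, 1] that the abstract mentions.

```python
import heapq
import random


def a_res_sample(stream, m, rng=None):
    """One-pass weighted sample of up to m items without replacement.

    `stream` yields (item, weight) pairs with positive weights.
    Hypothetical helper for illustration.
    """
    if rng is None:
        rng = random.Random()
    if m <= 0:
        return []
    heap = []  # min-heap of (key, arrival index, item)
    for count, (item, weight) in enumerate(stream):
        if weight <= 0:
            raise ValueError("weights must be positive")
        # Key u**(1/w): larger weights push the key towards 1.
        key = rng.random() ** (1.0 / weight)
        entry = (key, count, item)  # count breaks ties between equal keys
        if len(heap) < m:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            # New key beats the smallest key kept so far.
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 2.0), ("c", 3.0), ("d", 4.0)]
    print(a_res_sample(data, 2))
```

Keeping only the m largest keys in a heap means the stream is read once and memory stays proportional to m rather than to the population size.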

A Direct Bootstrap Method for Complex Sampling Designs From a Finite Population

In complex designs, classical bootstrap methods result in a biased variance estimator when the sampling design is not taken into account. Resampled units are usually rescaled or weighted in order to achieve unbiasedness in the linear case. In the present article, we propose novel resampling methods that may be directly applied to variance estimation. These methods consist of selecting subsample...

Lattice Paths, Sampling without Replacement, and the Kernel Method

In this work we consider weighted lattice paths in the quarter plane N0 × N0. The steps are given by (m, n) → (m − 1, n) and (m, n) → (m, n − 1), and are weighted as follows: step (m, n) → (m − 1, n) by m/(m + n) and step (m, n) → (m, n − 1) by n/(m + n). The considered lattice paths are absorbed at lines y = x/t − s/t with t ∈ N and s ∈ N0. We provide explicit formulæ for the sum of the weights of paths, st...

Publication date: 2016